17 research outputs found

    Hardware implementations of fixed-point Atan2

    Get PDF
    International audience—The atan2 function computes the polar angle arctan(x/y) of a point given by its cartesian coordinates. It is widely used in digital signal processing to recover the phase of a signal. This article studies for this context the implementation of atan2 with fixed-point inputs and outputs. It compares the prevalent CORDIC shift-and-add algorithm to two multiplier-based techniques. The first one reduces the bivariate atan2 function to two functions of one variable: the reciprocal, and the arctangent. These two functions may be tabulated, or evaluated using bipartite or polynomial approximation methods. The second technique directly uses piecewise bivariate polynomial approximations, in degree 1 and degree 2. It requires larger tables but has the shortest latency. Each of these approaches requires a relevant argument reduction, which is also discussed. All the algorithms are described with the same accuracy target (faithful rounding) and implemented with similar care in an open-source library. Based on synthesis results on FPGAs, their relevance domains are discussed

    L'arithmétique sur le tas

    Get PDF
    National audienceOn appelle un tas de bits une somme non évaluée de variable binaires, chacune pondérée par une puissance de 2. Par exemple, tous les polynômes à plusieurs variables peuvent s'exprimer comme un tas dont chaque variable est un ET logique des bits d'entrée. Cette représentation est pertinente car elle exprime le parallélisme au niveau du bit. La littérature sur les multiplieurs binaires montre comment construire des architectures efficaces qui calculent la valeurd'un tas de bits. Le présent article montre l'intérêt de revisiter un certain nombre d'opérateurs arithmétiques composés pour les exprimer comme des tas de bits

    Fixed-Point Implementations of the Reciprocal, Square Root and Reciprocal Square Root Functions

    Get PDF
    Implementations of the reciprocal, square root and reciprocal square root often share a common structure. This article is a survey and comparison of methods (with only slight variations for the three cases) for computing these functions. The comparisons are made in the context of the same accuracy target (faithful rounding) and of an arbitrary fixed-point format for the inputs and outputs (precisions of up to 32 bits). Some of the methods discussed might require some form of range reduction, depending on the input's range. The objective of the article is to optimize the use of fixed-size FPGA resources (block multipliers and block RAMs). The discussions and conclusions are based on synthesis results for FPGAs. They try to suggest the best method to compute the previously mentioned fixed-point functions on a FPGA, given the input precision. This work compares classical methods (direct tabulation, multipartite tables, piecewise polynomials, Taylor-based polynomials, Newton-Raphson iterations). It also studies methods that are novel in this context: the Halley method and, more generally, the Householder method

    Automating the pipeline of arithmetic datapaths

    Get PDF
    International audienceThis article presents the new framework for semi-automatic circuit pipelining that will be used in future releases of the FloPoCo generator. From a single description of an operator or datapath, optimized implementations are obtained automatically for a wide range of FPGA targets and a wide range of frequency/latency trade-offs. Compared to previous versions of FloPoCo, the level of abstraction has been raised, enabling easier development, shorter generator code, and better pipeline optimization. The proposed approach is also more flexible than fully automatic pipelining approaches based on retiming: In the proposed technique, the incremental construction of the pipeline along with the circuit graph enables architectural design decisions that depend on the pipeline. These allow pipeline-dependent changes to the circuit graph for finer optimization

    Pipeline automatique d’opérateurs dans FloPoCo 5.0

    Get PDF
    National audienceCet article présente la troisième évolution de la fonctionnalité de pipeline automatique intégrée dans le générateur de coeurs arithmétiques FloPoCo. La description combinatoire en VHDL d'un opérateur de calcul est d'abord encapsulée dans du C++. Ensuite, l'ajout de primitives simples à ce C++ permet d'obtenir automatiquement des versions pipelinées de ce VHDL, à une fréquence arbitraire et pour toute la gamme de cible supportées par l'outil. Les algorithmes utilisés sont décrits dans les grandes lignes, et la qualité des résultats est évaluée

    Sum-of-Product Architectures Computing Just Right

    Get PDF
    International audienceMany digital filters and signal-processing transforms can be expressed as a sum of products with constants (SPC). This paper addresses the automatic construction of low-precision, but high accuracy SPC architectures: these architectures are specified as last-bit accurate with respect to a mathematical definition. In other words, they behave as if the computation was performed with infinite accuracy, then rounded only once to the low-precision output format. This eases the task of porting double-precision code (e.g. Matlab) to low-precision hardware or FPGA. The paper further discusses the construction of the most efficient architectures obeying such a specification, introducing several architectural improvements to this purpose. This approach is demonstrated in a generic, open-source architecture generator tool built upon the FloPoCo framework. It is evaluated on Finite Impulse Response filters for the ZigBee protocol

    Arithmetic core generation using bit heaps

    Get PDF
    International audienceA bit heap is a data structure that holds the unevaluated sum of an arbitrary number of bits, each weighted by some power of two. Most advanced arithmetic cores can be viewed as involving one or several bit heaps. We claim here that this point of view leads to better global optimization at the algebraic level, at the circuit level, and in terms of software engineering. To demonstrate it, a generic software framework is introduced for the definition and optimization of bit heaps. This framework, targeting DSP-enabled FPGAs, is developed within the open-source FloPoCo arithmetic core generator. Its versatility is demonstrated on several examples: multipliers, complex multipliers, polynomials, and discrete cosine transform

    Opérateurs grossiers haute performance pour l'informatique basée FPGA

    No full text
    Field-Programmable Gate Arrays (FPGAs) have been shown to sometimes outperform mainstream microprocessors. The circuit paradigm enables efficient application-specific parallel computations. FPGAs also enable arithmetic efficiency: a bit is only computed if it is useful to the final result. To achieve this, FPGA arithmetic shouldn’t be limited to basic arithmetic operations offered by microprocessors. This thesis studies the implementation of coarser operations on FPGAs, in three main directions: New FPGA-specific approaches for evaluating the sine, cosine and the arctangent have been developed. Each function is tuned for its context and is as versatile and flexible as possible. Arithmetic efficiency requires error analysis and parameter tuning, and a fine understanding of the algorithms used. Digital filters are an important family of coarse operators resembling elementary functions: they can be specified at a high level as a transfer function with constraints on the signal/noise ratio, and then be implemented as an arithmetic datapath based on additions and multiplications. The main result is a method which transforms a high-level specification into a filter in an automated way. The first step is building an efficient method for computing sums of products by constants. Based on this, FIR and IIR filter generators are constructed. For arithmetic operators to achieve maximum performance, context-specific pipelining is required. Even if the designer’s knowledge is of great help when building and pipelining an arithmetic datapath, this remains complex and error-prone. A user-directed, automated method for pipelining has been developed. This thesis provides a generator of high-quality, ready-made operators for coarse computing cores, which brings FPGA-based computing a step closer to mainstream adoption. The cores are part of an open-ended generator, where functions are described as high-level objects such as mathematical expressions.Les FPGA (Field Programmable Gate Arrays) constituent un type de circuit reprogrammable qui, sous certaines conditions, peuvent avoir de meilleures performances que les microprocesseurs classiques. Les FPGA utilisent le circuit comme paradigme de programmation, ce qui permet d'effectuer des calculs parallèles propres à l'application visée. Ils permettent aussi d’atteindre l’efficacité arithmétique: un bit ne doit être calculé que s'il est utile dans le résultat final. Pour ce faire, l’arithmétique utilisée par les FPGA ne peut se limiter qu’à des fonctions conçues pour les microprocesseurs. Cette thèse se propose d’étudier les méthodes pour l’implémentation des fonctions gros-grain pour les FPGA à travers trois voies. De nouvelles méthodes pour évaluer des fonctions trigonométriques, telles que le sinus, cosinus et arc tangente ont été développés dans cette thèse. Chaque méthode est optimisée dans son contexte, de la manière la plus flexible et la plus souple possible. Pour que les méthodes aboutissent à leur efficacité arithmétique, il est nécessaire de procéder à une analyse d'erreurs, ainsi qu’à un choix attentif des paramétrés de la méthode et à une fine compréhension des algorithmes utilisés. Les filtres numériques constituent une famille importante d’opérateurs arithmétiques qui rassemble des fonctions élémentaires. Ils peuvent être spécifiés à un niveau élevé d'abstraction, à travers une fonction de transfert avec des contraintes sur le rapport signal/bruit. Ils peuvent être ensuite implémentés comme des chemins de données basés sur des additions et des multiplications. Le principal résultat est donc une méthode qui transforme une spécification de haut niveau en une implémentation d’une façon automatique. La première étape se rapporte au développement d'une méthode pour le calcul des produits par des constantes. Des filtres FIR et IIR peuvent être construits à l'aide de cette brique de base. Pour que les opérateurs arithmétiques atteignent leur performance maximale, on a besoin d’un pipeline correspondant au contexte donné. Même si les connaissances du développeur s’avèrent d’un grand avantage pendant le processus de création d'un pipeline d'un chemin de données, cette étape demeure complexe et facilement susceptible à des erreurs. Une méthode automatique, contrôlée par le développeur a dont été développée. Cette thèse fournit un générateur des opérateurs arithmétiques de haute qualité près à l'emploi, et qui propagent le domaine des calculs sur des FPGA à un pas plus proche de l’adoption générale. Les cœurs arithmétiques font partie d'un générateur open-source, où les fonctions peuvent être décrites par une spécification de haut niveau, comme par exemple une formule mathématique

    Automating the pipeline of arithmetic datapaths

    No full text
    International audienceThis article presents the new framework for semi-automatic circuit pipelining that will be used in future releases of the FloPoCo generator. From a single description of an operator or datapath, optimized implementations are obtained automatically for a wide range of FPGA targets and a wide range of frequency/latency trade-offs. Compared to previous versions of FloPoCo, the level of abstraction has been raised, enabling easier development, shorter generator code, and better pipeline optimization. The proposed approach is also more flexible than fully automatic pipelining approaches based on retiming: In the proposed technique, the incremental construction of the pipeline along with the circuit graph enables architectural design decisions that depend on the pipeline. These allow pipeline-dependent changes to the circuit graph for finer optimization

    Fixed-Point Trigonometric Functions on FPGAs

    Get PDF
    Three approaches for computing sines and cosines on FP-GAs are studied in this paper, with a focus of highthroughput pipelined architecture, and state-of-the-art implementation techniques. The first approach is the classical CORDIC iteration, for which we suggest a reduced iteration technique and fine optimizations in datapath width and latency. The second is an ad-hoc architecture specifically designed around trigonometric identities. The third uses a generic table- and DSP-based polynomial approximator. These three architectures are implemented and compared in the FloPoCo framework. 1
    corecore